Abstract

Recent developments in decoding Tanner codes with maximum-likelihood certificates 
are based on a sufficient condition called local-optimality. We define hierarchies of locally 
optimal codewords with respect to two parameters. One parameter is related to the distance 
of the local codes in Tanner codes. The second parameter is related to the finite number 
of iterations used in iterative decoding. We show that these hierarchies satisfy inclusion 
properties as these parameters are increased. In particular, this implies that a codeword that 
is decoded with a certificate using an iterative decoder after h iterations is decoded with a 
certificate after k · h iterations, for every positive integer k.

1 Introduction 

Local optimality is often used as a sufficient condition for successful decoding of finite length
codes (see e.g., [WJW05, ADS09]). In this work we focus on two parameters of the local-optimality
characterization for Tanner codes [HE11]. The first parameter is related to the distance of the
local codes in (expander) Tanner codes. The second parameter is related to the
finite number of iterations used in iterative decoding even beyond the girth. We define hierar- 
chies of local optimality with respect to these parameters. These hierarchies provide a partial 
explanation of two questions about successful decoding with ML-certificates: (1) What is the 
effect of increasing the distance of the local codes in Tanner codes? (2) What is the effect of 
increasing the number of iterations beyond the girth in iterative decoding? 

Previous Work: Density Evolution (DE) is used to study the asymptotic performance of
decoding algorithms based on Belief-Propagation (BP) (see e.g., [RU01, CF02]). Convergence
of BP-based decoding algorithms was studied in [FK00, WF01, WJW05, JP11]. Note that
convergence guarantees do not imply successful decoding after a finite number of iterations.
Korada and Urbanke [KU11] provide an asymptotic analysis of iterative decoding "beyond"
the girth. Specifically, they prove that one may exchange the order of the limits in DE-analysis 
of BP-decoding under certain conditions (i.e., variable node degree at least 5 and bounded 
LLRs). On the other hand, our work focuses on iterative decoding of finite length codes using 
a finite number of iterations. 
Suboptimal decoding of expander Tanner codes was analyzed in many works (see [SS96,
BZ04, FS05]). The results in these analyses rely on: (i) the expansion properties of the Tanner
graph, and (ii) constant relative distances of the local codes. The error-correcting guarantees in
these analyses improve as the relative distance increases.

A new local-optimality characterization for a codeword in a Tanner code w.r.t. any MBIOS 
channel was presented in [HE11]. A locally-optimal codeword is guaranteed to be both the
unique maximum-likelihood (ML) codeword as well as the unique LP-decoding codeword.
The characterization of local-optimality for Tanner codes has three parameters: (i) a height
$h \in \mathbb{N}$, (ii) level weights $w \in \mathbb{R}_+^h$, and (iii) a degree $2 \le d \le d^*$, where $d^*$ is the minimum
local distance.

A new message-passing decoding algorithm, called normalized weighted min-sum (NWMS), 
was presented for Tanner codes with single parity-check (SPC) local-codes [HE11]. The NWMS
decoder is guaranteed to compute the ML-codeword in h iterations provided that a locally-optimal
codeword with height h exists. The number of iterations h may exceed the girth of the
Tanner graph. 

Contribution: We present a variation of local-optimality called strong local-optimality. We
prove that if a codeword is strongly locally-optimal, then it is also locally-optimal. Hence, previous
results proved for local-optimality [HE11] hold also for strong local-optimality.

We present two hierarchies: (1) A hierarchy of local-optimality based on degrees. The 
degree hierarchy states that a locally optimal codeword x with degree parameter d is also 
locally-optimal with respect to any degree parameter d' > d. The degree hierarchy implies 
that the occurrence of local-optimality does not decrease as the degree parameter increases. 
(2) A hierarchy of strong local-optimality based on height. The height hierarchy states that 
if a codeword x is strongly locally-optimal with respect to height parameter h, then it is also 
strongly locally-optimal with respect to every height parameter that is an integer multiple of 
h. The height hierarchy proves, for example, that the performance of iterative decoding with 
an ML-certificate (e.g., NWMS) of finite-length Tanner codes with SPC local-codes does not 
degrade as the number of iterations grows, even beyond the girth of the Tanner graph. 

Organization. In Section 3 we introduce a key procedure used in the proof of the presented
hierarchies. In Section 4 we prove that the degree-based hierarchy induces a chain of inclusions
of locally optimal codewords and LLRs. In Section 5 we prove a height-based hierarchy over
strong local-optimality. We show that strong local-optimality implies local-optimality. Numerical
results of strong local-optimality and local-optimality with respect to the height hierarchy
are presented in Section 6. We conclude with a discussion in Section 7.

2 Preliminaries 

Graph Terminology. Let $G = (V, E)$ denote an undirected graph. Let $\mathcal{N}_G(v)$ denote the
set of neighbors of node $v \in V$, and let $\deg_G(v) = |\mathcal{N}_G(v)|$ denote the degree of node $v$ in
graph $G$. A path $p = (v, \ldots, u)$ in $G$ is a sequence of vertices such that there exists an edge
between every two consecutive nodes in the sequence $p$. A path $p$ is backtrackless if every two
consecutive edges along $p$ do not close a cycle. Let $|p|$ denote the number of edges in $p$. Let
girth$(G)$ denote the length of the shortest cycle in $G$.
Tanner codes. Let $G = (\mathcal{V} \cup \mathcal{J}, E)$ denote an edge-labeled bipartite graph, where
$\mathcal{V} = \{v_1, \ldots, v_N\}$ is a set of $N$ vertices called variable nodes, and $\mathcal{J} = \{C_1, \ldots, C_J\}$ is a set of $J$
vertices called local-code nodes. We associate with each local-code node $C_j$ a linear code $\overline{C}^j$
of length $\deg_G(C_j)$. Let $\overline{C}^{\mathcal{J}} = \{\overline{C}^j : 1 \le j \le J\}$ denote the set of local codes, one for each
local-code node. We say that $v_i$ participates in $\overline{C}^j$ if $(v_i, C_j)$ is an edge in $E$.

A word $x = (x_1, \ldots, x_N) \in \{0, 1\}^N$ corresponds to an assignment to variable nodes in $\mathcal{V}$, where $x_i$
is assigned to $v_i$. The Tanner code $C(G, \overline{C}^{\mathcal{J}})$ based on the labeled Tanner graph $G$ is the set
of vectors $x \in \{0, 1\}^N$ such that the projection of $x$ onto entries associated with $\mathcal{N}_G(C_j)$ is a
codeword in $\overline{C}^j$ for every $j \in \{1, \ldots, J\}$. Let $d_j$ denote the minimum distance of the local code
$\overline{C}^j$. The minimum local distance $d^*$ of a Tanner code $C(G, \overline{C}^{\mathcal{J}})$ is defined by $d^* = \min_j d_j$. We
assume that $d^* \ge 2$.

If the bipartite graph is $(d_L, d_R)$-regular, then the graph defines a $(d_L, d_R)$-regular Tanner
code. If the Tanner graph is sparse, i.e., $|E| = O(N)$, then it defines a low-density Tanner
code. Tanner codes with single parity check (SPC) local-codes that are based on sparse Tanner 
graphs are called low-density parity-check (LDPC) codes. 
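
The membership condition of a Tanner code can be illustrated with a minimal sketch. The representation below (a list of `(neighbors, local_code)` pairs) and the names `spc_code`, `is_tanner_codeword`, and the toy graph are assumptions made only for illustration; they are not taken from the paper.

```python
from itertools import product

def spc_code(length):
    """All even-weight binary words of the given length (single parity-check code)."""
    return {word for word in product((0, 1), repeat=length) if sum(word) % 2 == 0}

def is_tanner_codeword(x, local_nodes):
    """Return True if, for every local-code node, the projection of x onto
    that node's neighborhood is a codeword of the node's local code.

    local_nodes: list of (neighbors, local_code) pairs (hypothetical layout),
    where `neighbors` is an ordered tuple of variable-node indices and
    `local_code` is a set of binary tuples of the same length.
    """
    return all(tuple(x[i] for i in neighbors) in local_code
               for neighbors, local_code in local_nodes)

# Hypothetical toy Tanner graph: 4 variable nodes, 2 SPC local-code nodes of degree 3.
local_nodes = [((0, 1, 2), spc_code(3)), ((1, 2, 3), spc_code(3))]
print(is_tanner_codeword((1, 1, 0, 1), local_nodes))
print(is_tanner_codeword((1, 0, 0, 0), local_nodes))
```

With SPC local codes, as in this toy example, the check reduces to the usual parity-check test of an LDPC code.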

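Communicating over memoryless channels. Let $c_i \in \{0, 1\}$ denote the $i$th transmitted binary
symbol (channel input), and let $y_i \in \mathbb{R}$ denote the $i$th received symbol (channel output). A
memoryless binary-input output-symmetric (MBIOS) channel is defined by a conditional probability
density function $f(y_i \mid c_i = a)$ for $a \in \{0, 1\}$ that satisfies $f(y_i \mid 0) = f(-y_i \mid 1)$. In MBIOS
channels, the log-likelihood ratio (LLR) vector $\lambda \in \mathbb{R}^N$ is defined by
$\lambda_i(y_i) = \ln \frac{f(y_i \mid c_i = 0)}{f(y_i \mid c_i = 1)}$ for every input bit $i$. For a code $C$, Maximum-Likelihood (ML)
decoding is equivalent to

$\hat{x}^{ML}(y) = \arg\min_{x \in \mathrm{conv}(C)} \langle \lambda(y), x \rangle$, (1)

where $\mathrm{conv}(C)$ denotes the convex hull of the set $C$.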
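
As a worked instance of the LLR definition, consider a binary symmetric channel with crossover probability $p$. The representation of the channel output as $y_i \in \{+1, -1\}$ (with $+1$ received when bit 0 is observed) is an assumption made here so that the symmetry condition applies directly.

```latex
% Assumption: BSC(p) output written as y_i \in \{+1,-1\}; +1 means "0 observed".
\lambda_i(y_i)
  = \ln\frac{\Pr(y_i \mid c_i = 0)}{\Pr(y_i \mid c_i = 1)}
  = \begin{cases}
      \ln\frac{1-p}{p}, & y_i = +1,\\[2pt]
      \ln\frac{p}{1-p}, & y_i = -1,
    \end{cases}
  \qquad\text{i.e.}\qquad
  \lambda_i(y_i) = y_i \cdot \ln\frac{1-p}{p}.
% Example: p = 0.05 gives |\lambda_i| = \ln 19 \approx 2.944.
```

Every LLR over a BSC therefore has the same magnitude, and only its sign depends on whether the bit was flipped.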

Local-Optimality Characterization. A new characterization for local-optimality of Tanner
codes was presented in [HE11] as an extension of [ADS09, Von10]. Local-optimality is defined
in Definition 4.

Definition 1 (Path-Prefix Tree). Consider a graph $G = (V, E)$ and a node $r \in V$. Let $\hat{V}$
denote the set of all backtrackless paths in $G$ with length at most $h$ that start at node $r$, and let
$\hat{E} = \{(p_1, p_2) \in \hat{V} \times \hat{V} \mid p_1 \text{ is a prefix of } p_2,\ |p_1| + 1 = |p_2|\}$. We identify the empty path in $\hat{V}$
with $(r)$. Denote by $\mathcal{T}_r^h(G) = (\hat{V}, \hat{E})$ the path-prefix tree of $G$ rooted at node $r$ with height $h$.

Path prefix trees of G that are rooted in variable nodes are often called computation trees. 
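
The construction in Definition 1 can be sketched as a breadth-first enumeration of backtrackless paths. The function name `path_prefix_tree`, the adjacency-dictionary input, and the 4-cycle example are assumptions for illustration; the sketch also assumes a simple graph, so that a path backtracks exactly when it returns to the vertex it just came from.

```python
def path_prefix_tree(adj, root, height):
    """Enumerate the vertices (paths) and edges of the path-prefix tree T_r^h(G).

    adj: dict mapping each node to a list of its neighbors (simple undirected graph).
    Returns (paths, edges): `paths` is a list of tuples of nodes that start at
    `root`; `edges` is a list of pairs (p1, p2) where p1 is the prefix of p2
    that is one edge shorter.
    """
    paths, edges = [(root,)], []
    frontier = [(root,)]
    for _ in range(height):
        next_frontier = []
        for p in frontier:
            for u in adj[p[-1]]:
                # Backtrackless: never return along the edge just traversed.
                if len(p) >= 2 and u == p[-2]:
                    continue
                q = p + (u,)
                paths.append(q)
                edges.append((p, q))
                next_frontier.append(q)
        frontier = next_frontier
    return paths, edges

# Hypothetical example: a 4-cycle a-b-c-d-a, which has girth 4.
adj = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
paths, edges = path_prefix_tree(adj, "a", 4)
print(len(paths), len(edges))
```

Note that the tree keeps growing when the height exceeds the girth: paths such as `("a", "b", "c", "d", "a")` revisit the root, which is why several tree vertices can project to the same vertex of $G$.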

We use the following notation. Because vertices in $\mathcal{T}_r^h(G)$ are paths in $G$, we denote vertices
in path-prefix trees by $p$ and $q$. Vertices in $G$ are denoted by $u, v, r$. For a path $p \in \hat{V}$, let $t(p)$
denote the last vertex (target) of path $p$. Denote by $\mathrm{Prefix}^+(p)$ the set of proper prefixes of the
path $p$, i.e.,

$\mathrm{Prefix}^+(p) = \{q \mid q \text{ is a prefix of } p,\ 1 \le |q| < |p|\}$.

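Let $\mathcal{T}_r^h(G) = (\hat{V}, \hat{E})$ denote a path-prefix tree of a Tanner graph $G = (\mathcal{V} \cup \mathcal{J}, E)$. Let
$\hat{\mathcal{V}} = \{p \mid p \in \hat{V},\ t(p) \in \mathcal{V}\}$, and $\hat{\mathcal{J}} = \{p \mid p \in \hat{V},\ t(p) \in \mathcal{J}\}$. Paths in $\hat{\mathcal{V}}$ are called variable
paths, and paths in $\hat{\mathcal{J}}$ are called local-code paths.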

Definition 2 ($d$-tree). Denote by $\mathcal{T}_r^{2h}(G) = (\hat{\mathcal{V}} \cup \hat{\mathcal{J}}, \hat{E})$ the path-prefix tree of a Tanner graph
$G$ rooted at node $r \in \mathcal{V}$. A subtree $T \subseteq \mathcal{T}_r^{2h}(G)$ is a $d$-tree if: (i) $T$ is rooted at $(r)$, (ii) for
every local-code path $p \in T \cap \hat{\mathcal{J}}$, $\deg_T(p) = d$, and (iii) for every variable path $p \in T \cap \hat{\mathcal{V}}$,
$\deg_T(p) = \deg_{\mathcal{T}_r^{2h}}(p)$.

Let $\mathcal{T}[r, 2h, d](G)$ denote the set of all $d$-trees rooted at $r$ that are subtrees of $\mathcal{T}_r^{2h}(G)$.

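Definition 3 ($w$-weighted subtree). Let $T = (\hat{\mathcal{V}} \cup \hat{\mathcal{J}}, \hat{E})$ denote a subtree of $\mathcal{T}_r^{2h}(G)$, and let
$w = (w_1, \ldots, w_h) \in \mathbb{R}_+^h \setminus \{0^h\}$ denote a non-negative weight vector. Let $w_T : \hat{\mathcal{V}} \to \mathbb{R}$ denote
a weight function based on weight vector $w$ for variable paths $p \in \hat{\mathcal{V}}$ defined as follows. If $p$ is
an empty variable path, then $w_T(p) = 0$. Otherwise,

$w_T(p) = \frac{w_\ell}{\|w\|_1} \cdot \frac{1}{\deg_G(t(p))} \cdot \prod_{q \in \mathrm{Prefix}^+(p)} \frac{1}{\deg_T(q) - 1}$,

where $\ell = \lceil |p| / 2 \rceil$. We refer to $w_T$ as a $w$-weighted subtree.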

For any $w$-weighted subtree $w_T$ of $\mathcal{T}_r^{2h}(G)$, let $\pi_{G,T,w} : \mathcal{V} \to \mathbb{R}$ denote a function whose
values correspond to the projection of $w_T$ to the Tanner graph $G$. That is, for every variable
node $v$ in $G$,

$\pi_{G,T,w}(v) = \sum_{\{p \in T \mid t(p) = v\}} w_T(p)$. (2)

For a Tanner code $C(G)$, let $\mathcal{B}_d^{(w)} \subseteq [0, 1]^N$ denote the set of all projections of $w$-weighted
$d$-trees to $G$. That is,

$\mathcal{B}_d^{(w)} = \{\pi_{G,T,w} \mid T \in \bigcup_{r \in \mathcal{V}} \mathcal{T}[r, 2h, d](G)\}$. (3)

Vectors in $\mathcal{B}_d^{(w)}$ are referred to as deviations. For two vectors $x \in \{0, 1\}^N$ and $f \in [0, 1]^N$, let
$x \oplus f \in [0, 1]^N$ denote the relative point defined by $(x \oplus f)_i = |x_i - f_i|$ [Fel03].



Definition 4 (local-optimality, [HE11]). A codeword $x \in C(G)$ is $(h, w, d)$-locally optimal
with respect to $\lambda \in \mathbb{R}^N$ if for all vectors $\beta \in \mathcal{B}_d^{(w)}$,

$\langle \lambda, x \oplus \beta \rangle > \langle \lambda, x \rangle$. (4)



Theorem 5 (local-optimality is sufficient for ML and LP, [HE11]). Let $\lambda \in \mathbb{R}^N$ denote the
LLR vector received from the channel. If $x$ is an $(h, w, d)$-locally optimal codeword w.r.t. $\lambda$ and
some $2 \le d \le d^*$, then (1) $x$ is the unique maximum-likelihood codeword w.r.t. $\lambda$, and (2) $x$ is
the unique optimal solution of the LP-decoder given $\lambda$.

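For two vectors $y, z \in \mathbb{R}^N$, let "$*$" denote coordinatewise multiplication, i.e.,
$y * z = (y_1 \cdot z_1, \ldots, y_N \cdot z_N)$.

Proposition 6 ([HE11]). For every $\lambda \in \mathbb{R}^N$ and every $\beta \in [0, 1]^N$,

$\langle (-1)^x * \lambda, \beta \rangle = \langle \lambda, x \oplus \beta \rangle - \langle \lambda, x \rangle$.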
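
The identity in Proposition 6 can be checked numerically. The sketch below is only a sanity check on random inputs, not a proof; the helper names `relative_point`, `inner`, and `proposition_6_sides`, and the assumption that $x$ is a binary vector, are introduced here for illustration.

```python
import random

def relative_point(x, f):
    """Relative point x (+) f, defined coordinatewise by |x_i - f_i|."""
    return [abs(xi - fi) for xi, fi in zip(x, f)]

def inner(u, v):
    """Standard inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def proposition_6_sides(x, lam, beta):
    """Return both sides of <(-1)^x * lam, beta> = <lam, x (+) beta> - <lam, x>."""
    lam0 = [(-1) ** xi * li for xi, li in zip(x, lam)]  # coordinatewise (-1)^x * lam
    lhs = inner(lam0, beta)
    rhs = inner(lam, relative_point(x, beta)) - inner(lam, x)
    return lhs, rhs

rng = random.Random(0)
N = 8
x = [rng.randint(0, 1) for _ in range(N)]       # binary word (assumed x in {0,1}^N)
lam = [rng.gauss(0.0, 1.0) for _ in range(N)]   # arbitrary real LLR vector
beta = [rng.random() for _ in range(N)]         # arbitrary point of [0,1]^N
lhs, rhs = proposition_6_sides(x, lam, beta)
print(abs(lhs - rhs) < 1e-12)
```

The check works coordinate by coordinate: where $x_i = 0$ both sides contribute $\lambda_i \beta_i$, and where $x_i = 1$ both contribute $-\lambda_i \beta_i$.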

The following proposition states that the mapping $(x, \lambda) \mapsto (0^N, (-1)^x * \lambda)$ preserves
local-optimality.

Proposition 7 (symmetry of local-optimality, [HE11]). For every $x \in C$, $x$ is $(h, w, d)$-locally
optimal for $\lambda$ if and only if $0^N$ is $(h, w, d)$-locally optimal for $(-1)^x * \lambda$.
Figure 1: Trimmed tree $\mathrm{Trim}(T, q)$ of $T$ induced by $q$, obtained by deleting the subtree $T_q$.

3 Trimming Subtrees from a Path-Prefix Tree 

Let $T_q$ denote the subtree of a path-prefix tree $T$ hanging from path $q$, i.e., the subtree induced
by $\hat{V}_q = \{p \in \hat{\mathcal{V}} \cup \hat{\mathcal{J}} \mid q \in \mathrm{Prefix}^+(p) \text{ or } p = q\}$ (see Figure 1). Let $\mathrm{Trim}(T, q)$ denote
the trimmed tree of $T$ induced by $q$ obtained by deleting the subtree $T_q$ from $T$. Formally,
$\mathrm{Trim}(T, q)$ is the path-prefix subtree of $T$ induced by $(\hat{\mathcal{V}} \cup \hat{\mathcal{J}}) \setminus \hat{V}_q$. Note that if $q'$ is a sibling of
$q$ (i.e., $q'$ differs from $q$ only in the last edge), then the degree of the parent of $q$ and $q'$ decreases
as a result of trimming $\hat{V}_q$. Hence, $w_T(q'') < w_{\mathrm{Trim}(T,q)}(q'')$ for every variable path $q'' \in \hat{V}_{q'}$.

The proofs of hierarchies presented in the following sections are based on the following 
lemma. 

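Lemma 8. Let $T$ denote a subtree of a path-prefix tree $\mathcal{T}_r^{2h}(G)$. For every path $p \in T$ with at
least two children in $T$, there exists at least one child $q$ of $p$, such that

$\langle \lambda, \pi_{G,T,w} \rangle \ge \langle \lambda, \pi_{G,\mathrm{Trim}(T,q),w} \rangle$.

Proof. See Appendix A. □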

4 Degree Hierarchy of Local-Optimality

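Let $\Lambda \subseteq \mathbb{R}^N$ denote a set of vectors. Denote by $\mathrm{LO}_{C,\Lambda}(h, w, d)$ the set of pairs $(x, \lambda) \in C \times \Lambda$
such that $x$ is $(h, w, d)$-locally optimal w.r.t. $\lambda$. Formally,

$\mathrm{LO}_{C,\Lambda}(h, w, d) = \{(x, \lambda) \in C \times \Lambda \mid x \text{ is } (h, w, d)\text{-locally optimal w.r.t. } \lambda\}$. (5)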

The following theorem derives a hierarchy on the "density" of deviations in the local-optimality
characterization.

Theorem 9 ($d$-Hierarchy of local-optimality). Let $2 \le d < d^*$. For every $\Lambda \subseteq \mathbb{R}^N$,

$\mathrm{LO}_{C,\Lambda}(h, w, d) \subseteq \mathrm{LO}_{C,\Lambda}(h, w, d + 1)$.



Proof. We prove the contrapositive statement. Assume that $x$ is not $(h, w, d+1)$-locally optimal
w.r.t. $\lambda$. By Proposition 7, $0^N$ is not $(h, w, d+1)$-locally optimal w.r.t. $\lambda^0 = (-1)^x * \lambda$. Hence,
there exists a deviation $\beta = \pi_{G,T,w} \in \mathcal{B}_{d+1}^{(w)}$ such that $\langle \lambda^0, \beta \rangle \le 0$. Let $T$ denote the $(d+1)$-tree
that corresponds to the deviation $\beta$.

Consider the following iterative trimming process. Start with the $(d+1)$-tree $T$ and let
$T' \leftarrow T$. While there exists a local-code path $p \in T'$ such that $\deg_{T'}(p) = d + 1$, do:
$T' \leftarrow \mathrm{Trim}(T', q)$, where $q$ is a child of $p$ such that $\langle \lambda^0, \pi_{G,T',w} \rangle \ge \langle \lambda^0, \pi_{G,\mathrm{Trim}(T',q),w} \rangle$.

Lemma 8 guarantees that the iterative trimming process halts with a $d$-tree $T'$ whose corresponding
deviation $\beta' = \pi_{G,T',w}$ satisfies $\langle \lambda^0, \beta' \rangle \le \langle \lambda^0, \beta \rangle \le 0$. We conclude by Proposition 7
that $x$ is not $(h, w, d)$-locally optimal w.r.t. $\lambda$, as required. □

We therefore have for every $2 \le d < d^*$,

$\Pr_\lambda\{x \text{ is } (h, w, d+1)\text{-locally optimal w.r.t. } \lambda\} \ge \Pr_\lambda\{x \text{ is } (h, w, d)\text{-locally optimal w.r.t. } \lambda\}$. (6)

5 Height Hierarchy of Strong Local-Optimality

In this section we introduce a new combinatorial characterization named strong local-optimality.
We prove that if a codeword is strongly locally-optimal then it is also locally-optimal. The other
direction is not true in general.

Definition 10 (reduced $d$-tree). Denote by $\mathcal{T}_r^{2h}(G) = (\hat{\mathcal{V}} \cup \hat{\mathcal{J}}, \hat{E})$ the path-prefix tree of a
Tanner graph $G$ rooted at node $r \in \mathcal{V}$. A subtree $T \subseteq \mathcal{T}_r^{2h}(G)$ is a reduced $d$-tree if: (i) $T$
is rooted at $(r)$, (ii) $\deg_T((r)) = \deg_G(r) - 1$, (iii) for every local-code path $p \in T \cap \hat{\mathcal{J}}$,
$\deg_T(p) = d$, and (iv) for every non-empty variable path $p \in T \cap \hat{\mathcal{V}}$, $\deg_T(p) = \deg_{\mathcal{T}_r^{2h}}(p)$.

The only difference between a $d$-tree (Definition 2) and a reduced $d$-tree is that the degree of
the root in a reduced $d$-tree is smaller by 1 (as if the root itself hangs from an edge).

Let $\mathcal{T}^{\mathrm{red}}[r, 2h, d](G)$ denote the set of all reduced $d$-trees rooted at $r$ that are subtrees of
$\mathcal{T}_r^{2h}(G)$. For a Tanner code $C(G)$, let $\mathcal{B}_d^{(w),\mathrm{red}} \subseteq [0, 1]^N$ denote the set of all projections of
$w$-weighted reduced $d$-trees to $G$. That is,

$\mathcal{B}_d^{(w),\mathrm{red}} = \{\pi_{G,T,w} \mid T \in \bigcup_{r \in \mathcal{V}} \mathcal{T}^{\mathrm{red}}[r, 2h, d](G)\}$.

Vectors in $\mathcal{B}_d^{(w),\mathrm{red}}$ are referred to as reduced deviations.

The following definition is analogous to Definition 4 (local-optimality), using reduced deviations
instead of deviations.

Definition 11 (strong local-optimality). Let $C(G) \subseteq \{0, 1\}^N$ denote a Tanner code with minimum
local distance $d^*$. Let $w \in \mathbb{R}_+^h \setminus \{0^h\}$ denote a non-negative weight vector of length $h$
and let $2 \le d \le d^*$. A codeword $x \in C(G)$ is $(h, w, d)$-strong locally-optimal with respect to
$\lambda \in \mathbb{R}^N$ if for all vectors $\beta \in \mathcal{B}_d^{(w),\mathrm{red}}$,

$\langle \lambda, x \oplus \beta \rangle > \langle \lambda, x \rangle$. (7)
Denote by $\mathrm{SLO}_{C,\Lambda}(h, w, d)$ the set of pairs $(x, \lambda) \in C \times \Lambda$ such that $x$ is $(h, w, d)$-strong
locally-optimal w.r.t. $\lambda$. Formally,

$\mathrm{SLO}_{C,\Lambda}(h, w, d) = \{(x, \lambda) \in C \times \Lambda \mid x \text{ is } (h, w, d)\text{-strong locally-optimal w.r.t. } \lambda\}$. (8)

The following lemma states that if a codeword x is strongly locally-optimal w.r.t. A, then x 
is locally-optimal w.r.t. A. 

Lemma 12. For every $\Lambda \subseteq \mathbb{R}^N$,

$\mathrm{SLO}_{C,\Lambda}(h, w, d) \subseteq \mathrm{LO}_{C,\Lambda}(h, w, d)$.

Proof. We prove the contrapositive statement. Assume that $x$ is not $(h, w, d)$-locally optimal
w.r.t. $\lambda$. By Proposition 7, $0^N$ is not $(h, w, d)$-locally optimal w.r.t. $\lambda^0 = (-1)^x * \lambda$. Hence,
there exists a deviation $\beta = \pi_{G,T,w} \in \mathcal{B}_d^{(w)}$ such that $\langle \lambda^0, \beta \rangle \le 0$. Let $T$ denote the $d$-tree that
corresponds to the deviation $\beta$.

Denote by $(r)$ the root of $T$. By Lemma 8, the root $(r)$ has a child $q$ such that
$\langle \lambda^0, \pi_{G,T,w} \rangle \ge \langle \lambda^0, \pi_{G,\mathrm{Trim}(T,q),w} \rangle$. Let $T' = \mathrm{Trim}(T, q)$, and note that $T'$ is a reduced $d$-tree
rooted at $r$. Moreover, the corresponding reduced deviation $\beta' = \pi_{G,T',w}$ satisfies
$\langle \lambda^0, \beta' \rangle \le \langle \lambda^0, \beta \rangle \le 0$. We conclude by Proposition 7 that $x$ is not $(h, w, d)$-strong
locally-optimal w.r.t. $\lambda$, as required. □

Following Lemma 12 and Theorem 5, we have the following corollary.

Corollary 13 (strong local-optimality is sufficient for both ML and LP). Let $C(G)$ denote a
Tanner code with minimum local distance $d^*$. Let $h \in \mathbb{N}_+$ and $w \in \mathbb{R}_+^h$. Let $\lambda \in \mathbb{R}^N$ denote
the LLR vector received from the channel. If $x$ is an $(h, w, d)$-strong locally-optimal codeword
w.r.t. $\lambda$ and some $2 \le d \le d^*$, then (1) $x$ is the unique maximum-likelihood codeword w.r.t. $\lambda$,
and (2) $x$ is the unique solution of LP-decoding given $\lambda$.

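Consider a weight vector $\bar{w} \in \mathbb{R}_+^{k \cdot h}$, and let $\bar{w} = \bar{w}^1 \circ \bar{w}^2 \circ \ldots \circ \bar{w}^k$ denote its decomposition
to $k$ weight vectors $\bar{w}^i \in \mathbb{R}_+^h$. The vector $\bar{w} \in \mathbb{R}_+^{k \cdot h}$ is a $k$-legal extension of $w \in \mathbb{R}_+^h$ if there exists
$\alpha \in \mathbb{R}_+^k$ such that $\bar{w}^i = \alpha_i \cdot w$ for every $1 \le i \le k$. Note that if $\bar{w} \in \mathbb{R}_+^{k \cdot h}$ is geometric, then it is
a $k$-legal extension of the first block $\bar{w}^1$ in its decomposition.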
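
The notion of a $k$-legal extension can be illustrated with a small check that splits a long weight vector into blocks of length $h$ and tests whether each block is a scalar multiple of $w$. The function name `legal_extension_scalars` and the numerical tolerance are assumptions made for this sketch.

```python
def legal_extension_scalars(w_bar, w, tol=1e-12):
    """Return the scalars [alpha_1, ..., alpha_k] if w_bar is a k-legal
    extension of w (each length-h block of w_bar equals alpha_i * w),
    and None otherwise. Both vectors are assumed non-negative."""
    h = len(w)
    if h == 0 or len(w_bar) % h != 0 or not any(wj > 0 for wj in w):
        return None
    k = len(w_bar) // h
    ref = next(j for j in range(h) if w[j] > 0)  # a coordinate where w is non-zero
    alphas = []
    for i in range(k):
        block = w_bar[i * h:(i + 1) * h]
        alpha = block[ref] / w[ref]
        if any(abs(block[j] - alpha * w[j]) > tol for j in range(h)):
            return None
        alphas.append(alpha)
    return alphas

# Geometric weights rho^j with rho = 1/2, length k*h = 3*2.
geometric = [0.5 ** j for j in range(1, 7)]
print(legal_extension_scalars(geometric, geometric[:2]))
# Unit weights of length 6 extend unit weights of length 2.
print(legal_extension_scalars([1.0] * 6, [1.0, 1.0]))
# A vector that is not block-proportional is rejected.
print(legal_extension_scalars([1.0, 2.0, 3.0, 4.0], [1.0, 2.0]))
```

For geometric weights the scalars are themselves geometric, $\alpha_i = \rho^{(i-1)h}$, which is why a geometric vector extends its own first block.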

The following theorem derives a hierarchy on the height of reduced deviations in the strong
local-optimality characterization.

Theorem 14 ($h$-Hierarchy of strong LO). For every $\Lambda \subseteq \mathbb{R}^N$, if $\bar{w} \in \mathbb{R}_+^{k \cdot h}$ is a $k$-legal extension
of $w \in \mathbb{R}_+^h$, then

$\mathrm{SLO}_{C,\Lambda}(h, w, d) \subseteq \mathrm{SLO}_{C,\Lambda}(k \cdot h, \bar{w}, d)$.

Proof. We prove the contrapositive statement. Assume that $x$ is not $(k \cdot h, \bar{w}, d)$-strong locally-optimal
w.r.t. $\lambda$. Proposition 6 implies that $0^N$ is not $(k \cdot h, \bar{w}, d)$-strong locally-optimal w.r.t.
$\lambda^0 = (-1)^x * \lambda$. Hence, there exists a reduced deviation $\beta = \pi_{G,T,\bar{w}} \in \mathcal{B}_d^{(\bar{w}),\mathrm{red}}$ such that $\langle \lambda^0, \beta \rangle \le 0$.
Let $T$ denote the reduced $d$-tree that corresponds to the reduced deviation $\beta$.

Let $\{T^i\}$ denote a decomposition of $T$ to reduced $d$-trees of height $2h$ as shown in Figure 2,
where leaves of a subtree are the roots of other subtrees. Let $r_i$ denote the root of a reduced
$d$-tree $T^i$ in the decomposition of $T$. Let order$(T^i) = |r_i| / (2h)$. Namely, the order of $T^i$
equals its level in the decomposition. Note that $\beta$ is a non-negative combination of the
projections of the subtrees in the decomposition, i.e.,

$\pi_{G,T,\bar{w}} = \sum_{T^i \in \{T^i\}} c_i \cdot \pi_{G,T^i,w}$ for some coefficients $c_i \ge 0$.

Figure 2: Decomposition of a reduced $d$-tree $T$ of height $2kh$ to a set of subtrees $\{T^i\}$ that are
reduced $d$-trees of height $2h$.

Because $\langle \lambda^0, \beta \rangle \le 0$, we conclude by averaging that there exists at least one reduced $d$-tree
$T^i \in \{T^i\}$ of height $2h$ such that $\langle \lambda^0, \pi_{G,T^i,w} \rangle \le 0$. Hence, $0^N$ is not $(h, w, d)$-strong
locally-optimal w.r.t. $\lambda^0$. We apply Proposition 6 again, and conclude that $x$ is not $(h, w, d)$-strong
locally-optimal w.r.t. $\lambda$, as required. □

6 Numerical Results 

We conducted simulations to demonstrate two phenomena. First, we checked the gap between 
strong local optimality and local optimality. Second, we checked the effect of increasing the 
number of iterations on successful decoding with ML-certificates. 

We chose a (3, 6)-regular LDPC code with blocklength $N = 1008$ and girth $g = 6$ [Mac].
We simulated a set $\Lambda_p$ of 5000 LLR vectors corresponding to the all-zeros codeword with
respect to a BSC with crossover probability $p \in \{0.04, 0.05, 0.06\}$. We used unit level weights,
i.e., $w = 1^h$.
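
A set of LLR vectors such as the one described above could be generated along the following lines. This is a minimal sketch, not the simulation code used for the reported results: the function name `sample_bsc_llrs`, the seed, and the use of Python's `random` module are assumptions, and the example call draws only 50 vectors to keep it fast.

```python
import math
import random

def sample_bsc_llrs(n, p, num_vectors, seed=0):
    """Sample LLR vectors for the all-zeros codeword sent over a BSC(p).

    Each bit is flipped independently with probability p. An unflipped bit
    yields the LLR +ln((1-p)/p); a flipped bit yields -ln((1-p)/p).
    """
    rng = random.Random(seed)
    magnitude = math.log((1 - p) / p)
    return [[-magnitude if rng.random() < p else magnitude for _ in range(n)]
            for _ in range(num_vectors)]

# Blocklength 1008 as in the text; 50 vectors instead of 5000 for a quick run.
llrs = sample_bsc_llrs(n=1008, p=0.05, num_vectors=50)
flipped = sum(l < 0 for vec in llrs for l in vec) / (50 * 1008)
print(f"empirical crossover rate: {flipped:.4f}")
```

Because the all-zeros codeword is transmitted, a negative LLR entry marks exactly a bit that the channel flipped.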

Let $\mathrm{SLO}_{0^N,\Lambda_p}(h, w, 2)$ (resp., $\mathrm{LO}_{0^N,\Lambda_p}(h, w, 2)$) denote the set of LLR vectors $\lambda \in \Lambda_p$ such
that $0^N$ is strongly locally-optimal (resp., locally optimal) w.r.t. $\lambda$.

Figure 3 depicts the cardinality of $\mathrm{SLO}_{0^N,\Lambda_p}(h, w, 2)$ and $\mathrm{LO}_{0^N,\Lambda_p}(h, w, 2)$ as a function of $h$,
for three values of $p$. The results suggest that, in this setting, the sets $\mathrm{SLO}_{0^N,\Lambda_p}(h, w, 2)$ and
$\mathrm{LO}_{0^N,\Lambda_p}(h, w, 2)$ coincide as $h$ grows. This suggests also that the containment in Lemma 12
is asymptotically tight. That is, for large height h, strong local optimality is very close to 
local-optimality. 

The results also suggest that the number of iterations needed to obtain reasonable decoding
with ML-certificates is far greater than the girth. Clearly, the "tree property" that DE analysis
relies on does not hold for so many iterations. Indeed, the simulated crossover probabilities are
in the "waterfall" region of the word error rate curve with respect to NWMS. We are not aware
of any analytic explanation of this phenomenon in finite length codes.

Figure 3: Growth of strong local-optimality and local-optimality as a function of the height $h$.
$|\Lambda_p| = 5000$ for $p \in \{0.04, 0.05, 0.06\}$.

Another result of the simulation is that $\mathrm{SLO}_{0^N,\Lambda_p}(h, w, 2) \subseteq \mathrm{SLO}_{0^N,\Lambda_p}(h + 1, w, 2)$. Namely,
once a codeword is strongly locally-optimal for $\lambda$ with height $h$, it is also strongly locally-optimal
for any height $h' > h$. This exhibited strengthening of the height hierarchy result is not
true in general. Counterexamples can be obtained for other level weights $w$ and Tanner codes.

7 Discussion 

The degree hierarchy supports the improvement in the lower bounds for the threshold of the
crossover probability $p$ of a BSC$_p$ as a function of $d$ (see [HE11, Theorem 27]). These lower
bounds are proved by analyzing the probability of a locally optimal codeword as a function of $p$
and the degree $d$. For example, consider any (2, 16)-regular Tanner code with minimum local
distance 4 whose Tanner graph has logarithmic girth in the blocklength. The bounds in [HE11]
imply a lower bound on the threshold of $p_0 = 0.019$ with respect to degree $d = 3$. On the other
hand, the lower bound on the threshold increases to $p_0 = 0.044$ with respect to degree $d = 4$.

The height hierarchy implies that if a codeword $x$ is $(h, w, 2)$-strong locally-optimal w.r.t.
an LLR vector $\lambda$, then it is also strongly locally-optimal with respect to any legal extension of
level weights w with larger height h'. 

Consider a Tanner code with single parity-check local-codes. Assume that $x$ is a strongly
locally-optimal codeword w.r.t. $\lambda$ based on a height parameter $h$. Because strong local-optimality
implies local-optimality, following [HE11, Theorem 16], we conclude that iterative
message-passing decoding by NWMS is guaranteed to decode the ML-certified codeword $x$ after
$k \cdot h$ iterations, for every $k \in \mathbb{N}_+$. This gives the following new insight into convergence. If
a codeword $x$ is decoded after $h$ iterations and is certified to be strongly locally-optimal (and
hence ML-optimal), then $x$ is the outcome of NWMS infinitely many times (i.e., whenever the
number of iterations is a multiple of $h$).



8 Conclusion 

We present hierarchies of local optimality with respect to two parameters of the local-optimality 
characterization for Tanner codes [HE11]. One hierarchy is based on the local code node
degrees in the deviations. We prove containment, namely, the set of locally optimal codewords
with respect to degree $d + 1$ is a superset of the set of locally optimal codewords with respect
to degree d. 

The second hierarchy is based on the height of the deviations. We prove that, for geometric 
level weights, a strongly locally optimal codeword is infinitely often strongly locally optimal. 
This result implies that a codeword that is decoded with a certificate using the iterative decoder 
NWMS after h iterations is decoded with a certificate after k · h iterations, for every positive integer k.



A Proof of Lemma 8

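Proof. Consider a path $p \in T$. Then,

$\langle \lambda, \pi_{G,T,w} \rangle = \underbrace{\sum_{q \in \hat{\mathcal{V}} \setminus \hat{V}_p} \lambda_{t(q)} \cdot w_T(q)}_{(a)} + \underbrace{\sum_{q \in \hat{\mathcal{V}} \cap \hat{V}_p} \lambda_{t(q)} \cdot w_T(q)}_{(b)}$. (9)

We deal with terms (a) and (b) in Equation (9) separately.

First we deal with term (a). Let $q' \in \mathcal{N}_T(p)$, $|q'| = |p| + 1$, denote a child of $p$. Because
$p \notin \mathrm{Prefix}^+(q)$ for the paths accumulated in term (a), it holds that

$\sum_{q \in \hat{\mathcal{V}} \setminus \hat{V}_p} \lambda_{t(q)} \cdot w_T(q) = \sum_{q \in \hat{\mathcal{V}} \setminus \hat{V}_p} \lambda_{t(q)} \cdot w_{\mathrm{Trim}(T,q')}(q)$. (10)

Hence, term (a) remains unchanged under trimming children of $p$ from $T$.

It remains to show that there exists a child $q^*$ of $p$ whose trimming does not increase term
(b). Let $\mathrm{cost}_T(T_q) = \sum_{q'' \in \hat{\mathcal{V}} \cap \hat{V}_q} \lambda_{t(q'')} \cdot w_T(q'')$ denote the cost of $T_q$ with respect to $T$. Note that term
(b) equals $\mathrm{cost}_T(T_p)$, and

$\mathrm{cost}_T(T_p) = \lambda_{t(p)} \cdot w_T(p) + \sum_{\{q \in \mathcal{N}_T(p) \,:\, |q| = |p| + 1\}} \mathrm{cost}_T(T_q)$. (11)

Consider two children $q_1$ and $q_2$ of $p$. By Definition 3, for every variable path $q \in T_{q_2}$,
$w_T(q) = \frac{\deg_T(p) - 2}{\deg_T(p) - 1} \cdot w_{\mathrm{Trim}(T,q_1)}(q)$. Hence,

$\mathrm{cost}_T(T_{q_2}) = \frac{\deg_T(p) - 2}{\deg_T(p) - 1} \cdot \mathrm{cost}_{\mathrm{Trim}(T,q_1)}(T_{q_2})$. (12)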
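Let $q^{\max} = \arg\max\{\mathrm{cost}_T(T_q) \mid q \in \mathcal{N}_T(p),\ |q| = |p| + 1\}$. Namely, $q^{\max}$ is a child of $p$
for which the subtree hanging from it has a maximum cost. Trimming $q^{\max}$ scales the costs of the
remaining subtrees by $\frac{\deg_T(p) - 1}{\deg_T(p) - 2}$, and the maximum cost is at least the average cost of the children.
From Equations (11) and (12), it follows by averaging that $\mathrm{cost}_T(T_p) \ge \mathrm{cost}_{\mathrm{Trim}(T,q^{\max})}(T_p)$. Hence,
trimming the subtree that hangs from $q^{\max}$ does not increase term (b) in Equation (9), and the lemma follows. □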